Towards a Finite-State Parser for Swedish
Abstract
In this study, we describe a method for parsing part-of-speech tagged unrestricted texts in Swedish using finite-state networks. We use the Xerox Finite-State Tool because of its expressiveness and power for writing and compiling regular expressions and relations. The parser is divided into four modules: (i) a contiguous phrase structure marker, (ii) a phrasal head marker, (iii) a syntactic function tagger, and (iv) a noncontiguous group boundary recognizer. The aim is to develop a parser that can be used as a light (shallow) parser for marking phrase structure and, when needed, for labeling syntactic functions. We believe that modularity is necessary because different NLP tasks require different levels of analysis. The parser for Swedish is still under development, but the results so far are promising.
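The modular cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Xerox Finite-State Tool: Python's `re` module stands in for compiled finite-state networks, only the first two modules are shown, and the tagset (`DT`, `JJ`, `NN`, `VB`), the noun-phrase rule, the `[NP ...]` brackets, and the `<HEAD>` marker are all assumptions made for illustration.

```python
import re

# Hypothetical tagset: DT = determiner, JJ = adjective, NN = noun, VB = verb.
# Assumed stage-1 rule: a noun phrase is an optional determiner, any number
# of adjectives, and one noun, each token written as word/TAG.
NP = re.compile(r"(?<!\S)(?:\S+/DT\s)?(?:\S+/JJ\s)*\S+/NN(?!\S)")


def mark_phrases(text: str) -> str:
    """Stage 1: wrap each contiguous noun phrase in [NP ...] brackets."""
    return NP.sub(lambda m: "[NP " + m.group(0) + "]", text)


def mark_heads(text: str) -> str:
    """Stage 2: flag the noun that closes each bracketed NP as its head."""
    return re.sub(r"(\S+/NN)\]", r"\1<HEAD>]", text)


# The cascade: each stage reads the output of the previous one.
STAGES = [mark_phrases, mark_heads]


def parse(text: str, depth: int = len(STAGES)) -> str:
    """Run only the first `depth` stages, so callers pick the analysis level."""
    for stage in STAGES[:depth]:
        text = stage(text)
    return text


sentence = "den/DT gamla/JJ mannen/NN ser/VB en/DT bok/NN"
print(parse(sentence, depth=1))
# [NP den/DT gamla/JJ mannen/NN] ser/VB [NP en/DT bok/NN]
print(parse(sentence))
# [NP den/DT gamla/JJ mannen/NN<HEAD>] ser/VB [NP en/DT bok/NN<HEAD>]
```

The `depth` argument mirrors the modularity argument in the abstract: a task that only needs phrase boundaries can stop after the first stage, while later stages (head marking here, and syntactic function tagging in the described system) are applied only when needed.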
Similar resources
A Cascaded Finite-State Parser for Syntactic Analysis of Swedish
This report describes the development of a parsing system for written Swedish and is focused on a grammar, the main component of the system, semiautomatically extracted from corpora. A cascaded, finite-state algorithm is applied to the grammar in which the input contains coarse-grained semantic class information, and the output produced reflects not only the syntactic structure of the input, bu...
Finite matters: Verbal features in data-driven parsing of Swedish
This paper investigates the effect of a set of verbal features in data-driven dependency parsing of Swedish. Following an error analysis of a baseline parser, we show that the addition of information on verbal features such as tense and voice can give significant improvements over this baseline and, in particular, in the analysis of syntactic arguments. We furthermore show the importance of the ...
Collection, Encoding and Linguistic Processing of a Swedish Medical Corpus - The MEDLEX Experience
Corpora annotated with structural and linguistic characteristics play a major role in nearly every area of language processing. In recent years a number of corpora and large data sets have become known and available to research even in specialized fields such as medicine, but these still predominantly target the English language. This paper provides a description of the collection, enco...
Modularisation of Finnish Finite-State Language Description - Towards Wide Collaboration in Open Source Development of a Morphological Analyser
In this paper we present an open source implementation of a Finnish morphological parser. We briefly evaluate it against contemporary criticism of monolithic and unmaintainable finite-state language descriptions. We use it to demonstrate a way of writing a finite-state language description that is used for a varying set of projects that typically need a morphological analyser, such as POS tagging, ...
Comparative Study of GLR Parser with Finite-state Predictors and Chart-based Semantic Parsers
The natural language processing component of a speech understanding system is commonly a robust, semantic parser, implemented as either a chart-based transition network, or as a generalized left right (GLR) parser. In contrast, we are developing a robust, semantic parser that is a single, predictive finite-state machine. Our approach is motivated by our belief that such a finite-state parser ca...